In this lab, we will create a movie recommendation system based on the MovieLens dataset available here. The data consists of movie ratings on a scale of 1 to 5. Specifically, we'll use matrix factorization to learn user and movie embeddings. The concepts highlighted here are also covered in the course on Recommendation Systems.
# Ensure the right version of Tensorflow is installed.
!pip freeze | grep tensorflow==2.6
from __future__ import print_function
import numpy as np
import pandas as pd
import collections
from mpl_toolkits.mplot3d import Axes3D
from IPython import display
from matplotlib import pyplot as plt
import sklearn
import sklearn.manifold
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
# Add some convenience functions to Pandas DataFrame.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.3f}'.format
def mask(df, key, function):
"""Returns a filtered dataframe, by applying function to key"""
return df[function(df[key])]
def flatten_cols(df):
df.columns = [' '.join(col).strip() for col in df.columns.values]
return df
pd.DataFrame.mask = mask
pd.DataFrame.flatten_cols = flatten_cols
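As a quick illustration (on a throwaway toy DataFrame), here is how the two helpers behave; their definitions are repeated so the snippet is self-contained:

```python
import pandas as pd

def mask(df, key, function):
    """Returns a filtered dataframe, by applying function to key."""
    return df[function(df[key])]

def flatten_cols(df):
    df.columns = [' '.join(col).strip() for col in df.columns.values]
    return df

pd.DataFrame.mask = mask
pd.DataFrame.flatten_cols = flatten_cols

# Toy data, just to exercise the helpers.
toy = pd.DataFrame({'user_id': ['0', '0', '1'], 'rating': [5.0, 3.0, 1.0]})

# mask: keep only rows where rating > 2.
high = toy.mask('rating', lambda x: x > 2)

# flatten_cols: collapse MultiIndex columns such as ('rating', 'mean')
# into single strings like 'rating mean'.
agg = (toy.groupby('user_id', as_index=False)
          .agg({'rating': ['count', 'mean']})
          .flatten_cols())
```

This is the same pattern used later in the lab to build the `users_ratings` and `movies_ratings` DataFrames.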
# Let's install Altair for interactive visualizations.
# Note: GitHub no longer serves the git:// protocol, so we clone over https.
!pip install git+https://github.com/altair-viz/altair.git
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
#alt.renderers.enable('colab')
We then download the MovieLens Data, and create DataFrames containing movies, users, and ratings.
# Download MovieLens data.
print("Downloading movielens data...")
from urllib.request import urlretrieve
import zipfile
urlretrieve("http://files.grouplens.org/datasets/movielens/ml-100k.zip", "movielens.zip")
zip_ref = zipfile.ZipFile('movielens.zip', "r")
zip_ref.extractall()
print("Done. Dataset contains:")
print(zip_ref.read('ml-100k/u.info'))
Downloading movielens data... Done. Dataset contains: b'943 users\n1682 items\n100000 ratings\n'
# Load each data set (users, ratings, and movies).
users_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
users = pd.read_csv(
'ml-100k/u.user', sep='|', names=users_cols, encoding='latin-1')
ratings_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
ratings = pd.read_csv(
'ml-100k/u.data', sep='\t', names=ratings_cols, encoding='latin-1')
# The movies file contains a binary feature for each genre.
genre_cols = [
"genre_unknown", "Action", "Adventure", "Animation", "Children", "Comedy",
"Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
"Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"
]
movies_cols = [
'movie_id', 'title', 'release_date', "video_release_date", "imdb_url"
] + genre_cols
movies = pd.read_csv(
'ml-100k/u.item', sep='|', names=movies_cols, encoding='latin-1')
# Since the ids start at 1, we shift them to start at 0. This will make handling of the
# indices easier later
users["user_id"] = users["user_id"].apply(lambda x: str(x-1))
movies["movie_id"] = movies["movie_id"].apply(lambda x: str(x-1))
movies["year"] = movies['release_date'].apply(lambda x: str(x).split('-')[-1])
ratings["movie_id"] = ratings["movie_id"].apply(lambda x: str(x-1))
ratings["user_id"] = ratings["user_id"].apply(lambda x: str(x-1))
ratings["rating"] = ratings["rating"].apply(lambda x: float(x))
# Compute the number of movies to which a genre is assigned.
genre_occurences = movies[genre_cols].sum().to_dict()
# Since some movies can belong to more than one genre, we create different
# 'genre' columns as follows:
# - all_genres: all the active genres of the movie.
# - genre: randomly sampled from the active genres.
def mark_genres(movies, genres):
def get_random_genre(gs):
active = [genre for genre, g in zip(genres, gs) if g==1]
if len(active) == 0:
return 'Other'
return np.random.choice(active)
def get_all_genres(gs):
active = [genre for genre, g in zip(genres, gs) if g==1]
if len(active) == 0:
return 'Other'
return '-'.join(active)
movies['genre'] = [
get_random_genre(gs) for gs in zip(*[movies[genre] for genre in genres])]
movies['all_genres'] = [
get_all_genres(gs) for gs in zip(*[movies[genre] for genre in genres])]
mark_genres(movies, genre_cols)
# Create one merged DataFrame containing all the movielens data.
movielens = ratings.merge(movies, on='movie_id').merge(users, on='user_id')
# Utility to split the data into training and test sets.
def split_dataframe(df, holdout_fraction=0.1):
"""Splits a DataFrame into training and test sets.
Args:
df: a dataframe.
holdout_fraction: fraction of dataframe rows to use in the test set.
Returns:
train: dataframe for training
test: dataframe for testing
"""
test = df.sample(frac=holdout_fraction, replace=False)
train = df[~df.index.isin(test.index)]
return train, test
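A quick sanity check of the splitting logic on a toy DataFrame (the function is repeated so the snippet runs standalone): the two sets should partition the rows.

```python
import pandas as pd

def split_dataframe(df, holdout_fraction=0.1):
    """Same splitting logic as above: sample the test set, keep the rest."""
    test = df.sample(frac=holdout_fraction, replace=False)
    train = df[~df.index.isin(test.index)]
    return train, test

toy = pd.DataFrame({'rating': range(100)})
train, test = split_dataframe(toy, holdout_fraction=0.1)
# train and test have disjoint indices, with sizes 90 and 10.
```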
Before we dive into model building, let's inspect our MovieLens dataset. It is usually helpful to understand the statistics of the dataset.
We start by printing some basic statistics describing the numeric user features.
users.describe()
|  | age |
|---|---|
| count | 943.000 |
| mean | 34.052 |
| std | 12.193 |
| min | 7.000 |
| 25% | 25.000 |
| 50% | 31.000 |
| 75% | 43.000 |
| max | 73.000 |
We can also print some basic statistics describing the categorical user features.
# Note: np.object is deprecated in recent NumPy; use the 'object' dtype string.
users.describe(include=['object'])
|  | user_id | sex | occupation | zip_code |
|---|---|---|---|---|
| count | 943 | 943 | 943 | 943 |
| unique | 943 | 2 | 21 | 795 |
| top | 0 | M | student | 55414 |
| freq | 1 | 670 | 196 | 9 |
We can also create histograms to further understand the distribution of the users. We use Altair to create an interactive chart.
# The following functions are used to generate interactive Altair charts.
# We will display histograms of the data, sliced by a given attribute.
# Create filters to be used to slice the data.
occupation_filter = alt.selection_multi(fields=["occupation"])
occupation_chart = alt.Chart().mark_bar().encode(
x="count()",
y=alt.Y("occupation:N"),
color=alt.condition(
occupation_filter,
alt.Color("occupation:N", scale=alt.Scale(scheme='category20')),
alt.value("lightgray")),
).properties(width=300, height=300, selection=occupation_filter)
# A function that generates a histogram of filtered data.
def filtered_hist(field, label, filter):
"""Creates a layered chart of histograms.
The first layer (light gray) contains the histogram of the full data, and the
second contains the histogram of the filtered data.
Args:
field: the field for which to generate the histogram.
label: String label of the histogram.
filter: an alt.Selection object to be used to filter the data.
"""
base = alt.Chart().mark_bar().encode(
x=alt.X(field, bin=alt.Bin(maxbins=10), title=label),
y="count()",
).properties(
width=300,
)
return alt.layer(
base.transform_filter(filter),
base.encode(color=alt.value('lightgray'), opacity=alt.value(.7)),
).resolve_scale(y='independent')
Next, we look at the distribution of ratings per user. Clicking on an occupation in the right chart will filter the data by that occupation. The corresponding histogram is shown in blue, and superimposed with the histogram for the whole data (in light gray). You can use SHIFT+click to select multiple subsets.
What do you observe, and how might this affect the recommendations?
users_ratings = (
ratings
.groupby('user_id', as_index=False)
.agg({'rating': ['count', 'mean']})
.flatten_cols()
.merge(users, on='user_id')
)
# Create a chart for the count, and one for the mean.
alt.hconcat(
filtered_hist('rating count', '# ratings / user', occupation_filter),
filtered_hist('rating mean', 'mean user rating', occupation_filter),
occupation_chart,
data=users_ratings)
It is also useful to look at information about the movies and their ratings.
movies_ratings = movies.merge(
ratings
.groupby('movie_id', as_index=False)
.agg({'rating': ['count', 'mean']})
.flatten_cols(),
on='movie_id')
genre_filter = alt.selection_multi(fields=['genre'])
genre_chart = alt.Chart().mark_bar().encode(
x="count()",
y=alt.Y('genre'),
color=alt.condition(
genre_filter,
alt.Color("genre:N"),
alt.value('lightgray'))
).properties(height=300, selection=genre_filter)
(movies_ratings[['title', 'rating count', 'rating mean']]
.sort_values('rating count', ascending=False)
.head(10))
|  | title | rating count | rating mean |
|---|---|---|---|
| 49 | Star Wars (1977) | 583 | 4.358 |
| 257 | Contact (1997) | 509 | 3.804 |
| 99 | Fargo (1996) | 508 | 4.156 |
| 180 | Return of the Jedi (1983) | 507 | 4.008 |
| 293 | Liar Liar (1997) | 485 | 3.157 |
| 285 | English Patient, The (1996) | 481 | 3.657 |
| 287 | Scream (1996) | 478 | 3.441 |
| 0 | Toy Story (1995) | 452 | 3.878 |
| 299 | Air Force One (1997) | 431 | 3.631 |
| 120 | Independence Day (ID4) (1996) | 429 | 3.438 |
(movies_ratings[['title', 'rating count', 'rating mean']]
.mask('rating count', lambda x: x > 20)
.sort_values('rating mean', ascending=False)
.head(10))
|  | title | rating count | rating mean |
|---|---|---|---|
| 407 | Close Shave, A (1995) | 112 | 4.491 |
| 317 | Schindler's List (1993) | 298 | 4.466 |
| 168 | Wrong Trousers, The (1993) | 118 | 4.466 |
| 482 | Casablanca (1942) | 243 | 4.457 |
| 113 | Wallace & Gromit: The Best of Aardman Animatio... | 67 | 4.448 |
| 63 | Shawshank Redemption, The (1994) | 283 | 4.445 |
| 602 | Rear Window (1954) | 209 | 4.388 |
| 11 | Usual Suspects, The (1995) | 267 | 4.386 |
| 49 | Star Wars (1977) | 583 | 4.358 |
| 177 | 12 Angry Men (1957) | 125 | 4.344 |
Finally, the last chart shows the distribution of the number of ratings and average rating.
# Display the number of ratings and average rating per movie.
alt.hconcat(
filtered_hist('rating count', '# ratings / movie', genre_filter),
filtered_hist('rating mean', 'mean movie rating', genre_filter),
genre_chart,
data=movies_ratings)
Our goal is to factorize the ratings matrix $A$ into the product of a user embedding matrix $U$ and movie embedding matrix $V$, such that $A \approx UV^\top$ with $U = \begin{bmatrix} u_{1} \\ \hline \vdots \\ \hline u_{N} \end{bmatrix}$ and $V = \begin{bmatrix} v_{1} \\ \hline \vdots \\ \hline v_{M} \end{bmatrix}$.
Here, $N$ is the number of users, $M$ is the number of movies, and row $u_i$ of $U$ (respectively row $v_j$ of $V$) is the embedding of user $i$ (respectively movie $j$).
The rating matrix could be very large and, in general, most of the entries are unobserved, since a given user will only rate a small subset of movies. For an efficient representation, we will use a tf.SparseTensor. A SparseTensor uses three tensors to represent the matrix: tf.SparseTensor(indices, values, dense_shape), where a value $A_{ij} = a$ is encoded by setting indices[k] = [i, j] and values[k] = a. The last tensor, dense_shape, specifies the shape of the full underlying matrix.
Assume we have $2$ users and $4$ movies. Our toy ratings dataframe has three ratings,
| user_id | movie_id | rating |
|---|---|---|
| 0 | 0 | 5.0 |
| 0 | 1 | 3.0 |
| 1 | 3 | 1.0 |
The corresponding rating matrix is
$$ A = \begin{bmatrix} 5.0 & 3.0 & 0 & 0 \\ 0 & 0 & 0 & 1.0 \end{bmatrix} $$And the SparseTensor representation is,
SparseTensor(
indices=[[0, 0], [0, 1], [1,3]],
values=[5.0, 3.0, 1.0],
dense_shape=[2, 4])
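To make the encoding concrete without running TensorFlow, the same toy example can be reproduced in plain NumPy by scattering values into a dense zero matrix at indices:

```python
import numpy as np

# The three components of the toy SparseTensor above.
indices = np.array([[0, 0], [0, 1], [1, 3]])
values = np.array([5.0, 3.0, 1.0])
dense_shape = (2, 4)

# Scatter the observed ratings into a dense zero matrix:
# this reconstructs the rating matrix A.
A = np.zeros(dense_shape)
A[indices[:, 0], indices[:, 1]] = values
```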
In this exercise, we'll write a function that maps from our ratings DataFrame to a tf.SparseTensor.
Hint: you can select the values of a given column of a Dataframe df using df['column_name'].values.
#Solution
def build_rating_sparse_tensor(ratings_df):
"""
Args:
ratings_df: a pd.DataFrame with `user_id`, `movie_id` and `rating` columns.
Returns:
a tf.SparseTensor representing the ratings matrix.
"""
indices = ratings_df[['user_id', 'movie_id']].values
values = ratings_df['rating'].values
return tf.SparseTensor(
indices=indices,
values=values,
dense_shape=[users.shape[0], movies.shape[0]])
The model approximates the ratings matrix $A$ by a low-rank product $UV^\top$. We need a way to measure the approximation error. We'll start by using the Mean Squared Error of observed entries only (we will revisit this later). It is defined as
$$ \begin{align*} \text{MSE}(A, UV^\top) &= \frac{1}{|\Omega|}\sum_{(i, j) \in\Omega}{( A_{ij} - (UV^\top)_{ij})^2} \\ &= \frac{1}{|\Omega|}\sum_{(i, j) \in\Omega}{( A_{ij} - \langle U_i, V_j\rangle)^2} \end{align*} $$where $\Omega$ is the set of observed ratings, and $|\Omega|$ is the cardinality of $\Omega$.
Write a TensorFlow function that takes a sparse rating matrix $A$ and the two embedding matrices $U, V$ and returns the mean squared error $\text{MSE}(A, UV^\top)$.
Hints:
- A SparseTensor sp_x is a tuple of three Tensors: sp_x.indices, sp_x.values and sp_x.dense_shape.
- You may find tf.gather_nd and tf.losses.mean_squared_error helpful.
#Solution
def sparse_mean_square_error(sparse_ratings, user_embeddings, movie_embeddings):
"""
Args:
sparse_ratings: A SparseTensor rating matrix, of dense_shape [N, M]
user_embeddings: A dense Tensor U of shape [N, k] where k is the embedding
dimension, such that U_i is the embedding of user i.
movie_embeddings: A dense Tensor V of shape [M, k] where k is the embedding
dimension, such that V_j is the embedding of movie j.
Returns:
A scalar Tensor representing the MSE between the true ratings and the
model's predictions.
"""
predictions = tf.gather_nd(
tf.matmul(user_embeddings, movie_embeddings, transpose_b=True),
sparse_ratings.indices)
loss = tf.losses.mean_squared_error(sparse_ratings.values, predictions)
return loss
Note: One approach is to compute the full prediction matrix $UV^\top$, then gather the entries corresponding to the observed pairs. The memory cost of this approach is $O(NM)$. For the MovieLens dataset, this is fine, as the dense $N \times M$ matrix is small enough to fit in memory ($N = 943$, $M = 1682$).
Another approach (given in the alternate solution below) is to only gather the embeddings of the observed pairs, then compute their dot products. The memory cost is $O(|\Omega| d)$ where $d$ is the embedding dimension. In our case, $|\Omega| = 10^5$, and the embedding dimension is on the order of $10$, so the memory cost of both methods is comparable. But when the number of users or movies is much larger, the first approach becomes infeasible.
#Alternate Solution
def sparse_mean_square_error(sparse_ratings, user_embeddings, movie_embeddings):
"""
Args:
sparse_ratings: A SparseTensor rating matrix, of dense_shape [N, M]
user_embeddings: A dense Tensor U of shape [N, k] where k is the embedding
dimension, such that U_i is the embedding of user i.
movie_embeddings: A dense Tensor V of shape [M, k] where k is the embedding
dimension, such that V_j is the embedding of movie j.
Returns:
A scalar Tensor representing the MSE between the true ratings and the
model's predictions.
"""
predictions = tf.reduce_sum(
tf.gather(user_embeddings, sparse_ratings.indices[:, 0]) *
tf.gather(movie_embeddings, sparse_ratings.indices[:, 1]),
axis=1)
loss = tf.losses.mean_squared_error(sparse_ratings.values, predictions)
return loss
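The two solutions compute identical predictions; the following NumPy sketch (toy sizes, random embeddings) checks that gathering from the full product $UV^\top$ agrees with gathering the embeddings first and taking row-wise dot products:

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(5, 3))   # 5 users, embedding dimension 3
V = rng.normal(size=(7, 3))   # 7 movies
idx = np.array([[0, 2], [4, 6], [1, 1]])  # observed (user, movie) pairs

# Approach 1: materialize the full N x M prediction matrix, then gather.
full = (U @ V.T)[idx[:, 0], idx[:, 1]]

# Approach 2: gather the embeddings of the observed pairs, then take
# row-wise dot products, never building the N x M matrix.
sparse = np.sum(U[idx[:, 0]] * V[idx[:, 1]], axis=1)

assert np.allclose(full, sparse)
```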
This is a simple class to train a matrix factorization model using stochastic gradient descent.
The class constructor takes
- the embedding variables (a dictionary of tf.Variables, here the user embeddings U and the movie embeddings V),
- a loss to optimize (a tf.Tensor),
- an optional list of metrics dictionaries to plot during training.
After training, one can access the trained embeddings using the model.embeddings dictionary.
Example usage:
U_var = ...
V_var = ...
loss = ...
model = CFModel(U_var, V_var, loss)
model.train(iterations=100, learning_rate=1.0)
user_embeddings = model.embeddings['user_id']
movie_embeddings = model.embeddings['movie_id']
class CFModel(object):
"""Simple class that represents a collaborative filtering model"""
def __init__(self, embedding_vars, loss, metrics=None):
"""Initializes a CFModel.
Args:
embedding_vars: A dictionary of tf.Variables.
loss: A float Tensor. The loss to optimize.
metrics: optional list of dictionaries of Tensors. The metrics in each
dictionary will be plotted in a separate figure during training.
"""
self._embedding_vars = embedding_vars
self._loss = loss
self._metrics = metrics
self._embeddings = {k: None for k in embedding_vars}
self._session = None
@property
def embeddings(self):
"""The embeddings dictionary."""
return self._embeddings
def train(self, num_iterations=100, learning_rate=1.0, plot_results=True,
optimizer=tf.train.GradientDescentOptimizer):
"""Trains the model.
Args:
num_iterations: number of iterations to run.
learning_rate: optimizer learning rate.
plot_results: whether to plot the results at the end of training.
optimizer: the optimizer to use. Defaults to GradientDescentOptimizer.
Returns:
The metrics dictionary evaluated at the last iteration.
"""
with self._loss.graph.as_default():
opt = optimizer(learning_rate)
train_op = opt.minimize(self._loss)
local_init_op = tf.group(
tf.variables_initializer(opt.variables()),
tf.local_variables_initializer())
if self._session is None:
self._session = tf.Session()
with self._session.as_default():
self._session.run(tf.global_variables_initializer())
self._session.run(tf.tables_initializer())
#tf.train.start_queue_runners()
with self._session.as_default():
local_init_op.run()
iterations = []
metrics = self._metrics or ({},)
metrics_vals = [collections.defaultdict(list) for _ in metrics]
# Train and append results.
for i in range(num_iterations + 1):
_, results = self._session.run((train_op, metrics))
if (i % 10 == 0) or i == num_iterations:
print("\r iteration %d: " % i + ", ".join(
["%s=%f" % (k, v) for r in results for k, v in r.items()]),
end='')
iterations.append(i)
for metric_val, result in zip(metrics_vals, results):
for k, v in result.items():
metric_val[k].append(v)
for k, v in self._embedding_vars.items():
self._embeddings[k] = v.eval()
if plot_results:
# Plot the metrics.
num_subplots = len(metrics)+1
fig = plt.figure()
fig.set_size_inches(num_subplots*10, 8)
for i, metric_vals in enumerate(metrics_vals):
ax = fig.add_subplot(1, num_subplots, i+1)
for k, v in metric_vals.items():
ax.plot(iterations, v, label=k)
ax.set_xlim([1, num_iterations])
ax.legend()
return results
Using your sparse_mean_square_error function, write a function that builds a CFModel by creating the embedding variables and the train and test losses.
#Solution
def build_model(ratings, embedding_dim=3, init_stddev=1.):
"""
Args:
ratings: a DataFrame of the ratings
embedding_dim: the dimension of the embedding vectors.
init_stddev: float, the standard deviation of the random initial embeddings.
Returns:
model: a CFModel.
"""
# Split the ratings DataFrame into train and test.
train_ratings, test_ratings = split_dataframe(ratings)
# SparseTensor representation of the train and test datasets.
A_train = build_rating_sparse_tensor(train_ratings)
A_test = build_rating_sparse_tensor(test_ratings)
# Initialize the embeddings using a normal distribution.
U = tf.Variable(tf.random.normal(
[A_train.dense_shape[0], embedding_dim], stddev=init_stddev))
V = tf.Variable(tf.random.normal(
[A_train.dense_shape[1], embedding_dim], stddev=init_stddev))
train_loss = sparse_mean_square_error(A_train, U, V)
test_loss = sparse_mean_square_error(A_test, U, V)
metrics = {
'train_error': train_loss,
'test_error': test_loss
}
embeddings = {
"user_id": U,
"movie_id": V
}
return CFModel(embeddings, train_loss, [metrics])
Great, now it's time to train the model!
Go ahead and run the next cell, trying different parameters (embedding dimension, learning rate, iterations). The training and test errors are plotted at the end of training. You can inspect these values to validate the hyper-parameters.
Note: by calling model.train again, the model will continue training starting from the current values of the embeddings.
# Build the CF model and train it.
model = build_model(ratings, embedding_dim=30, init_stddev=0.5)
model.train(num_iterations=1000, learning_rate=10.)
iteration 1000: train_error=0.373024, test_error=1.349209
[{'train_error': 0.37302384, 'test_error': 1.3492094}]
When the embedding dimension is greater than 3, the embeddings can only be visualized by projecting them onto a lower-dimensional space (for example, the first 3 dimensions). The next section takes a more detailed look at the embeddings.
In this section, we take a closer look at the learned embeddings, by computing recommendations, inspecting the nearest neighbors of some movies, and examining the norms and low-dimensional projections of the movie embeddings.
We start by writing a function that, given a query embedding $u \in \mathbb R^d$ and item embeddings $V \in \mathbb R^{N \times d}$, computes the item scores.
As discussed in the lecture, there are different similarity measures we can use, and these can yield different results. We will compare the following:
- the dot product: the score of item $j$ is $\langle u, V_j \rangle$.
- cosine similarity: the score of item $j$ is $\frac{\langle u, V_j \rangle}{\|u\|\|V_j\|}$.
Hints:
- Use np.dot to compute the product of two np.arrays.
- Use np.linalg.norm to compute the norm of a np.array.
DOT = 'dot'
COSINE = 'cosine'
def compute_scores(query_embedding, item_embeddings, measure=DOT):
"""Computes the scores of the candidates given a query.
Args:
query_embedding: a vector of shape [k], representing the query embedding.
item_embeddings: a matrix of shape [N, k], such that row i is the embedding
of item i.
measure: a string specifying the similarity measure to be used. Can be
either DOT or COSINE.
Returns:
scores: a vector of shape [N], such that scores[i] is the score of item i.
"""
u = query_embedding
V = item_embeddings
if measure == COSINE:
V = V / np.linalg.norm(V, axis=1, keepdims=True)
u = u / np.linalg.norm(u)
scores = u.dot(V.T)
return scores
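A toy example illustrates how the two measures differ; the function body is repeated here (with plain strings instead of the DOT/COSINE constants) so the snippet is self-contained. The dot product rewards items with large norms, while cosine similarity ignores norms:

```python
import numpy as np

def compute_scores(query_embedding, item_embeddings, measure='dot'):
    # Same logic as above: optionally normalize, then take dot products.
    u, V = query_embedding, item_embeddings
    if measure == 'cosine':
        V = V / np.linalg.norm(V, axis=1, keepdims=True)
        u = u / np.linalg.norm(u)
    return u.dot(V.T)

u = np.array([1.0, 0.0])
V = np.array([[2.0, 0.0],    # same direction as u, large norm
              [0.5, 0.0],    # same direction as u, small norm
              [0.0, 3.0]])   # orthogonal to u
dot = compute_scores(u, V, 'dot')     # -> [2.0, 0.5, 0.0]
cos = compute_scores(u, V, 'cosine')  # -> [1.0, 1.0, 0.0]
```

Under the dot product, the large-norm item wins; under cosine similarity, both aligned items score identically regardless of norm.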
Equipped with this function, we can compute recommendations, where the query embedding can be either a user embedding or a movie embedding.
def user_recommendations(model, measure=DOT, exclude_rated=False, k=6):
# USER_RATINGS is set earlier in the lab, when you enter your own ratings;
# they are stored as an additional user with id 943.
if USER_RATINGS:
scores = compute_scores(
model.embeddings["user_id"][943], model.embeddings["movie_id"], measure)
score_key = measure + ' score'
df = pd.DataFrame({
score_key: list(scores),
'movie_id': movies['movie_id'],
'titles': movies['title'],
'genres': movies['all_genres'],
})
if exclude_rated:
# remove movies that are already rated
rated_movies = ratings[ratings.user_id == "943"]["movie_id"].values
df = df[df.movie_id.apply(lambda movie_id: movie_id not in rated_movies)]
display.display(df.sort_values([score_key], ascending=False).head(k))
def movie_neighbors(model, title_substring, measure=DOT, k=6):
# Search for movie ids that match the given substring.
ids = movies[movies['title'].str.contains(title_substring)].index.values
titles = movies.iloc[ids]['title'].values
if len(titles) == 0:
raise ValueError("Found no movies with title %s" % title_substring)
print("Nearest neighbors of : %s." % titles[0])
if len(titles) > 1:
print("[Found more than one matching movie. Other candidates: {}]".format(
", ".join(titles[1:])))
movie_id = ids[0]
scores = compute_scores(
model.embeddings["movie_id"][movie_id], model.embeddings["movie_id"],
measure)
score_key = measure + ' score'
df = pd.DataFrame({
score_key: list(scores),
'titles': movies['title'],
'genres': movies['all_genres']
})
display.display(df.sort_values([score_key], ascending=False).head(k))
Let's look at the nearest neighbors for some of the movies.
movie_neighbors(model, "Aladdin", DOT)
movie_neighbors(model, "Aladdin", COSINE)
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
|  | dot score | titles | genres |
|---|---|---|---|
| 94 | 6.565 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 419 | 5.965 | Alice in Wonderland (1951) | Animation-Children-Musical |
| 1655 | 5.948 | Little City (1998) | Comedy-Romance |
| 1630 | 5.945 | Slingshot, The (1993) | Comedy-Drama |
| 49 | 5.535 | Star Wars (1977) | Action-Adventure-Romance-Sci-Fi-War |
| 227 | 5.327 | Star Trek: The Wrath of Khan (1982) | Action-Adventure-Sci-Fi |
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
|  | cosine score | titles | genres |
|---|---|---|---|
| 94 | 1.000 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 731 | 0.832 | Dave (1993) | Comedy-Romance |
| 70 | 0.804 | Lion King, The (1994) | Animation-Children-Musical |
| 0 | 0.802 | Toy Story (1995) | Animation-Children-Comedy |
| 419 | 0.778 | Alice in Wonderland (1951) | Animation-Children-Musical |
| 209 | 0.775 | Indiana Jones and the Last Crusade (1989) | Action-Adventure |
It seems that the quality of learned embeddings may not be very good. Can you think of potential techniques that could be used to improve them? We can start by inspecting the embeddings.
We can also observe that the recommendations with dot-product and cosine are different: with dot-product, the model tends to recommend popular movies. This can be explained by the fact that in matrix factorization models, the norm of the embedding is often correlated with popularity (popular movies have a larger norm), which makes it more likely to recommend more popular items. We can confirm this hypothesis by sorting the movies by their embedding norm, as done in the next cell.
def movie_embedding_norm(models):
"""Visualizes the norm and number of ratings of the movie embeddings.
Args:
models: A CFModel object, or a list of CFModel objects.
"""
if not isinstance(models, list):
models = [models]
df = pd.DataFrame({
'title': movies['title'],
'genre': movies['genre'],
'num_ratings': movies_ratings['rating count'],
})
charts = []
brush = alt.selection_interval()
for i, model in enumerate(models):
norm_key = 'norm'+str(i)
df[norm_key] = np.linalg.norm(model.embeddings["movie_id"], axis=1)
nearest = alt.selection(
type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
empty='none')
base = alt.Chart().mark_circle().encode(
x='num_ratings',
y=norm_key,
color=alt.condition(brush, alt.value('#4c78a8'), alt.value('lightgray'))
).properties(
selection=nearest).add_selection(brush)
text = alt.Chart().mark_text(align='center', dx=5, dy=-5).encode(
x='num_ratings', y=norm_key,
text=alt.condition(nearest, 'title', alt.value('')))
charts.append(alt.layer(base, text))
return alt.hconcat(*charts, data=df)
def visualize_movie_embeddings(data, x, y):
nearest = alt.selection(
type='single', encodings=['x', 'y'], on='mouseover', nearest=True,
empty='none')
base = alt.Chart().mark_circle().encode(
x=x,
y=y,
color=alt.condition(genre_filter, "genre", alt.value("whitesmoke")),
).properties(
width=600,
height=600,
selection=nearest)
text = alt.Chart().mark_text(align='left', dx=5, dy=-5).encode(
x=x,
y=y,
text=alt.condition(nearest, 'title', alt.value('')))
return alt.hconcat(alt.layer(base, text), genre_chart, data=data)
def tsne_movie_embeddings(model):
"""Visualizes the movie embeddings, projected using t-SNE with Cosine measure.
Args:
model: A CFModel object.
"""
tsne = sklearn.manifold.TSNE(
n_components=2, perplexity=40, metric='cosine', early_exaggeration=10.0,
init='pca', verbose=True, n_iter=400)
print('Running t-SNE...')
V_proj = tsne.fit_transform(model.embeddings["movie_id"])
movies.loc[:,'x'] = V_proj[:, 0]
movies.loc[:,'y'] = V_proj[:, 1]
return visualize_movie_embeddings(movies, 'x', 'y')
movie_embedding_norm(model)
Note: Depending on how the model is initialized, you may observe that some niche movies (ones with few ratings) have a high norm, leading to spurious recommendations. This can happen if the embedding of that movie happens to be initialized with a high norm. Then, because the movie has few ratings, it is infrequently updated, and can keep its high norm. This can be alleviated by using regularization.
Try changing the value of the hyperparameter init_stddev. One helpful fact: the expected norm of a $d$-dimensional vector with entries drawn from $\mathcal N(0, \sigma^2)$ is approximately $\sigma \sqrt d$.
How does this affect the embedding norm distribution, and the ranking of the top-norm movies?
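A quick numerical check of this rule of thumb, in plain NumPy, using the two init_stddev values tried in this lab:

```python
import numpy as np

d = 30  # the embedding dimension used in this lab
rng = np.random.default_rng(0)
for sigma in (0.5, 0.05):  # the two init_stddev values tried below
    # Sample many d-dimensional Gaussian vectors and average their norms.
    norms = np.linalg.norm(rng.normal(scale=sigma, size=(10000, d)), axis=1)
    mean_norm = norms.mean()
    # The average norm should be close to sigma * sqrt(d).
    print(f"sigma={sigma}: mean norm {mean_norm:.3f}, "
          f"sigma*sqrt(d) = {sigma * np.sqrt(d):.3f}")
```

So lowering init_stddev from 0.5 to 0.05 shrinks the typical initial embedding norm by a factor of 10.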
model_lowinit = build_model(ratings, embedding_dim=30, init_stddev=0.05)
model_lowinit.train(num_iterations=1000, learning_rate=10.)
movie_neighbors(model_lowinit, "Aladdin", DOT)
movie_neighbors(model_lowinit, "Aladdin", COSINE)
movie_embedding_norm([model, model_lowinit])
iteration 1000: train_error=0.356038, test_error=0.953454
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
|  | dot score | titles | genres |
|---|---|---|---|
| 94 | 5.726 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 63 | 5.027 | Shawshank Redemption, The (1994) | Drama |
| 0 | 4.932 | Toy Story (1995) | Animation-Children-Comedy |
| 21 | 4.766 | Braveheart (1995) | Action-Drama-War |
| 49 | 4.762 | Star Wars (1977) | Action-Adventure-Romance-Sci-Fi-War |
| 70 | 4.704 | Lion King, The (1994) | Animation-Children-Musical |
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
|  | cosine score | titles | genres |
|---|---|---|---|
| 94 | 1.000 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 238 | 0.868 | Sneakers (1992) | Crime-Drama-Sci-Fi |
| 1269 | 0.849 | Life with Mikey (1993) | Comedy |
| 1077 | 0.840 | Oliver & Company (1988) | Animation-Children |
| 1144 | 0.838 | Blue Chips (1994) | Drama |
| 962 | 0.832 | Some Folks Call It a Sling Blade (1993) | Drama-Thriller |
Since it is hard to visualize embeddings in a higher-dimensional space (when the embedding dimension $k > 3$), one approach is to project the embeddings to a lower-dimensional space. t-SNE (t-distributed Stochastic Neighbor Embedding) is an algorithm that projects the embeddings while attempting to preserve their pairwise distances. It can be useful for visualization, but one should use it with care. For more information on using t-SNE, see How to Use t-SNE Effectively.
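A minimal sketch of what such a projection involves, using sklearn's TSNE on a stand-in embedding matrix (the shapes below are illustrative, not the model's actual embeddings):

```python
import numpy as np
from sklearn.manifold import TSNE

# Project a [num_movies, d] embedding matrix down to 2-D for plotting.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 30))  # stand-in for the movie embeddings
V_proj = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
print(V_proj.shape)  # (200, 2)
```

The 2-D coordinates in V_proj can then be scatter-plotted, colored by genre, which is what tsne_movie_embeddings does for the trained model.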
tsne_movie_embeddings(model_lowinit)
Running t-SNE... [t-SNE] Computing 121 nearest neighbors... [t-SNE] Indexed 1682 samples in 0.001s... [t-SNE] Computed neighbors for 1682 samples in 0.147s...
[t-SNE] Computed conditional probabilities for sample 1000 / 1682 [t-SNE] Computed conditional probabilities for sample 1682 / 1682 [t-SNE] Mean sigma: 0.117384 [t-SNE] KL divergence after 250 iterations with early exaggeration: 57.296547 [t-SNE] KL divergence after 400 iterations: 2.219639
You can highlight the embeddings of a given genre by clicking on the genres panel (SHIFT+click to select multiple genres).
We can observe that the embeddings do not seem to have any notable structure, and the embeddings of a given genre are located all over the embedding space. This confirms the poor quality of the learned embeddings. One of the main reasons is that we only trained the model on observed pairs, and without regularization.
In this section, we will train a simple softmax model that predicts whether a given user has rated a movie.
The model will take as input a feature vector $x$ representing the list of movies the user has rated. We start from the ratings DataFrame, which we group by user_id.
rated_movies = (ratings[["user_id", "movie_id"]]
.groupby("user_id", as_index=False)
.aggregate(lambda x: list(x)))
rated_movies.head()
| user_id | movie_id | |
|---|---|---|
| 0 | 0 | [60, 188, 32, 159, 19, 201, 170, 264, 154, 116... |
| 1 | 1 | [291, 250, 49, 313, 296, 289, 311, 280, 12, 27... |
| 2 | 10 | [110, 557, 731, 226, 424, 739, 722, 37, 724, 1... |
| 3 | 100 | [828, 303, 595, 221, 470, 404, 280, 251, 281, ... |
| 4 | 101 | [767, 822, 69, 514, 523, 321, 624, 160, 447, 4... |
We then create a function that generates an example batch, such that each example contains the following features: movie_id (the list of movie ids rated by the user), genre (the genres of those movies), year (their release years), and label (the movie ids, from which a target is later sampled).
#@title Batch generation code (run this cell)
years_dict = {
    movie: year for movie, year in zip(movies["movie_id"], movies["year"])
}
genres_dict = {
    movie: genres.split('-')
    for movie, genres in zip(movies["movie_id"], movies["all_genres"])
}

def make_batch(ratings, batch_size):
  """Creates a batch of examples.

  Args:
    ratings: A DataFrame of ratings such that examples["movie_id"] is a list of
      movies rated by a user.
    batch_size: The batch size.
  """
  def pad(x, fill):
    return pd.DataFrame.from_dict(x).fillna(fill).values
  movie = []
  year = []
  genre = []
  label = []
  for movie_ids in ratings["movie_id"].values:
    movie.append(movie_ids)
    genre.append([x for movie_id in movie_ids for x in genres_dict[movie_id]])
    year.append([years_dict[movie_id] for movie_id in movie_ids])
    label.append([int(movie_id) for movie_id in movie_ids])
  features = {
      "movie_id": pad(movie, ""),
      "year": pad(year, ""),
      "genre": pad(genre, ""),
      "label": pad(label, -1)
  }
  batch = (
      tf.data.Dataset.from_tensor_slices(features)
      .shuffle(1000)
      .repeat()
      .batch(batch_size)
      .make_one_shot_iterator()
      .get_next())
  return batch

def select_random(x):
  """Selects a random element from each row of x."""
  def to_float(x):
    return tf.cast(x, tf.float32)
  def to_int(x):
    return tf.cast(x, tf.int64)
  batch_size = tf.shape(x)[0]
  rn = tf.range(batch_size)
  nnz = to_float(tf.count_nonzero(x >= 0, axis=1))
  rnd = tf.random_uniform([batch_size])
  ids = tf.stack([to_int(rn), to_int(nnz * rnd)], axis=1)
  return to_int(tf.gather_nd(x, ids))
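To see what select_random computes, here is an equivalent plain-NumPy sketch, under the same assumption the TF version makes: padding entries are -1 and valid entries form a prefix of each row. For each row it picks one uniformly random valid entry (used later to sample a target label from the user's rated movies):

```python
import numpy as np

def select_random_np(x, rng):
    """Picks one random non-padding (>= 0) entry from each row of x."""
    x = np.asarray(x)
    nnz = (x >= 0).sum(axis=1)                    # valid entries per row
    cols = (rng.random(x.shape[0]) * nnz).astype(np.int64)
    return x[np.arange(x.shape[0]), cols]

rng = np.random.default_rng(0)
batch = np.array([[3, 7, 9, -1, -1],
                  [5, -1, -1, -1, -1]])
picked = select_random_np(batch, rng)
# picked[0] is one of 3, 7, 9; picked[1] is always 5.
```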
Recall that the softmax model maps the input features $x$ to a user embedding $\psi(x) \in \mathbb R^d$, where $d$ is the embedding dimension. This vector is then multiplied by a movie embedding matrix $V \in \mathbb R^{m \times d}$ (where $m$ is the number of movies), and the final output of the model is the softmax of the product $$ \hat p(x) = \text{softmax}(\psi(x) V^\top). $$ Given a target label $y$, if we denote by $p = 1_y$ a one-hot encoding of this target label, then the loss is the cross-entropy between $\hat p(x)$ and $p$.
In this exercise, we will write a function that takes tensors representing the user embeddings $\psi(x)$, movie embeddings $V$, and target label $y$, and returns the cross-entropy loss.
Hint: You can use the function tf.nn.sparse_softmax_cross_entropy_with_logits, which takes logits as input, where logits refers to the product $\psi(x) V^\top$.
# Solution
def softmax_loss(user_embeddings, movie_embeddings, labels):
  """Returns the cross-entropy loss of the softmax model.

  Args:
    user_embeddings: A tensor of shape [batch_size, embedding_dim].
    movie_embeddings: A tensor of shape [num_movies, embedding_dim].
    labels: A tensor of [batch_size], such that labels[i] is the target label
      for example i.
  Returns:
    The mean cross-entropy loss.
  """
  # Verify that the embeddings have compatible dimensions.
  user_emb_dim = user_embeddings.shape[1].value
  movie_emb_dim = movie_embeddings.shape[1].value
  if user_emb_dim != movie_emb_dim:
    raise ValueError(
        "The user embedding dimension %d should match the movie embedding "
        "dimension %d" % (user_emb_dim, movie_emb_dim))
  logits = tf.matmul(user_embeddings, movie_embeddings, transpose_b=True)
  loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits=logits, labels=labels))
  return loss
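The TF op above computes exactly $-\log \hat p(x)_y$ averaged over the batch. The identity can be checked with plain NumPy on hypothetical small shapes:

```python
import numpy as np

# Numerically-stable log-softmax cross-entropy, matching the definition
# loss = -log softmax(psi(x) V^T)[y], averaged over the batch.
rng = np.random.default_rng(0)
user_emb = rng.normal(size=(4, 8))     # [batch_size, embedding_dim]
movie_emb = rng.normal(size=(10, 8))   # [num_movies, embedding_dim]
labels = np.array([2, 0, 7, 5])

logits = user_emb @ movie_emb.T                          # [batch, num_movies]
logits -= logits.max(axis=1, keepdims=True)              # stabilization
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(labels)), labels].mean()
print(loss)  # a positive scalar
```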
We are now ready to build a softmax CFModel. Complete the build_softmax_model function in the next cell. The architecture of the model is defined in the function create_user_embeddings and illustrated in the figure below. The input embeddings (movie_id, genre and year) are concatenated to form the input layer, then we have hidden layers with dimensions specified by the hidden_dims argument. Finally, the last hidden layer is multiplied by the movie embeddings to obtain the logits layer. For the target label, we will use a randomly-sampled movie_id from the list of movies the user rated.

Complete the function below by creating the feature columns and embedding columns, then creating the loss tensors both for the train and test sets (using the softmax_loss function of the previous exercise).
# Solution
def build_softmax_model(rated_movies, embedding_cols, hidden_dims):
  """Builds a Softmax model for MovieLens.

  Args:
    rated_movies: DataFrame of training examples.
    embedding_cols: A dictionary mapping feature names (string) to embedding
      column objects. This will be used in tf.feature_column.input_layer() to
      create the input layer.
    hidden_dims: int list of the dimensions of the hidden layers.
  Returns:
    A CFModel object.
  """
  def create_network(features):
    """Maps input features dictionary to user embeddings.

    Args:
      features: A dictionary of input string tensors.
    Returns:
      outputs: A tensor of shape [batch_size, embedding_dim].
    """
    # Create a bag-of-words embedding for each sparse feature.
    inputs = tf.feature_column.input_layer(features, embedding_cols)
    # Hidden layers.
    input_dim = inputs.shape[1].value
    for i, output_dim in enumerate(hidden_dims):
      w = tf.get_variable(
          "hidden%d_w_" % i, shape=[input_dim, output_dim],
          initializer=tf.truncated_normal_initializer(
              stddev=1./np.sqrt(output_dim))) / 10.
      outputs = tf.matmul(inputs, w)
      input_dim = output_dim
      inputs = outputs
    return outputs

  train_rated_movies, test_rated_movies = split_dataframe(rated_movies)
  train_batch = make_batch(train_rated_movies, 200)
  test_batch = make_batch(test_rated_movies, 100)

  with tf.variable_scope("model", reuse=False):
    # Train
    train_user_embeddings = create_network(train_batch)
    train_labels = select_random(train_batch["label"])
  with tf.variable_scope("model", reuse=True):
    # Test
    test_user_embeddings = create_network(test_batch)
    test_labels = select_random(test_batch["label"])
    movie_embeddings = tf.get_variable(
        "input_layer/movie_id_embedding/embedding_weights")

  test_loss = softmax_loss(
      test_user_embeddings, movie_embeddings, test_labels)
  train_loss = softmax_loss(
      train_user_embeddings, movie_embeddings, train_labels)
  _, test_precision_at_10 = tf.metrics.precision_at_k(
      labels=test_labels,
      predictions=tf.matmul(test_user_embeddings, movie_embeddings,
                            transpose_b=True),
      k=10)

  metrics = (
      {"train_loss": train_loss, "test_loss": test_loss},
      {"test_precision_at_10": test_precision_at_10}
  )
  embeddings = {"movie_id": movie_embeddings}
  return CFModel(embeddings, train_loss, metrics)
We are now ready to train the softmax model. You can set the following hyperparameters:
- learning rate
- number of iterations. Note: you can run softmax_model.train() again to continue training the model from its current state.
- dimensions of the input embeddings (the input_dims argument)
- number and dimensions of the hidden layers (the hidden_dims argument)

Note: since our input features are string-valued (movie_id, genre, and year), we need to map them to integer ids. This is done using tf.feature_column.categorical_column_with_vocabulary_list, which takes a vocabulary list specifying all the values the feature can take. Then each id is mapped to an embedding vector using tf.feature_column.embedding_column.
# Create feature embedding columns
def make_embedding_col(key, embedding_dim):
  categorical_col = tf.feature_column.categorical_column_with_vocabulary_list(
      key=key, vocabulary_list=list(set(movies[key].values)), num_oov_buckets=0)
  return tf.feature_column.embedding_column(
      categorical_column=categorical_col, dimension=embedding_dim,
      # default initializer: truncated normal with stddev=1/sqrt(dimension)
      combiner='mean')

with tf.Graph().as_default():
  softmax_model = build_softmax_model(
      rated_movies,
      embedding_cols=[
          make_embedding_col("movie_id", 35),
          make_embedding_col("genre", 3),
          make_embedding_col("year", 2),
      ],
      hidden_dims=[35])

softmax_model.train(
    learning_rate=8., num_iterations=3000, optimizer=tf.train.AdagradOptimizer)
iteration 3000: train_loss=5.422085, test_loss=5.970283, test_precision_at_10=0.010804
({'train_loss': 5.422085, 'test_loss': 5.9702826},
{'test_precision_at_10': 0.010803732089303566})
We can inspect the movie embeddings as we did for the previous models. Note that in this case, the movie embeddings are used at the same time as input embeddings (for the bag of words representation of the user history), and as softmax weights.
movie_neighbors(softmax_model, "Aladdin", DOT)
movie_neighbors(softmax_model, "Aladdin", COSINE)
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
| | dot score | titles | genres |
|---|---|---|---|
| 94 | 21.258 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 587 | 18.576 | Beauty and the Beast (1991) | Animation-Children-Musical |
| 173 | 18.526 | Raiders of the Lost Ark (1981) | Action-Adventure |
| 0 | 17.836 | Toy Story (1995) | Animation-Children-Comedy |
| 49 | 17.380 | Star Wars (1977) | Action-Adventure-Romance-Sci-Fi-War |
| 63 | 17.164 | Shawshank Redemption, The (1994) | Drama |
Nearest neighbors of : Aladdin (1992). [Found more than one matching movie. Other candidates: Aladdin and the King of Thieves (1996)]
| | cosine score | titles | genres |
|---|---|---|---|
| 94 | 1.000 | Aladdin (1992) | Animation-Children-Comedy-Musical |
| 587 | 0.866 | Beauty and the Beast (1991) | Animation-Children-Musical |
| 419 | 0.806 | Alice in Wonderland (1951) | Animation-Children-Musical |
| 70 | 0.793 | Lion King, The (1994) | Animation-Children-Musical |
| 90 | 0.766 | Nightmare Before Christmas, The (1993) | Children-Comedy-Musical |
| 626 | 0.755 | Robin Hood: Prince of Thieves (1991) | Drama |
movie_embedding_norm(softmax_model)
tsne_movie_embeddings(softmax_model)
Running t-SNE... [t-SNE] Computing 121 nearest neighbors... [t-SNE] Indexed 1682 samples in 0.000s... [t-SNE] Computed neighbors for 1682 samples in 0.117s... [t-SNE] Computed conditional probabilities for sample 1000 / 1682
[t-SNE] Computed conditional probabilities for sample 1682 / 1682 [t-SNE] Mean sigma: 0.188410 [t-SNE] KL divergence after 250 iterations with early exaggeration: 53.912308 [t-SNE] KL divergence after 400 iterations: 1.266384